Curriculum Learning for Handwritten Text Line Recognition
Recurrent Neural Networks (RNNs) have recently achieved the best performance
in off-line handwritten text recognition. At the same time, training RNNs by
gradient descent leads to slow convergence, and training times are particularly
long when the training database consists of full lines of text. In this paper,
we propose an easy way to accelerate stochastic gradient descent in this
set-up, and in the general context of learning to recognize sequences. The
principle, called curriculum learning, or shaping, is to first learn to
recognize short sequences before training on all available training
sequences. Experiments on three different handwritten text databases (Rimes,
IAM, OpenHaRT) show that a simple implementation of this strategy can
significantly speed up the training of RNNs for text recognition, and even
significantly improve performance in some cases.
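The curriculum strategy described above can be sketched in a few lines of Python (a minimal illustration; the function and parameter names are assumptions, not the paper's actual implementation):

```python
# Curriculum learning for sequence recognition: start training on the
# shortest target sequences, then gradually admit longer ones.
# All names here are illustrative; the paper does not specify an API.

def curriculum_batches(samples, num_stages=4, batch_size=32):
    """Yield training batches stage by stage, each stage adding longer sequences.

    `samples` is a list of (input, target) pairs, where targets are sequences.
    """
    ordered = sorted(samples, key=lambda s: len(s[1]))  # shortest targets first
    n = len(ordered)
    for stage in range(1, num_stages + 1):
        # Stage k trains on the shortest k/num_stages fraction of the data.
        cutoff = max(batch_size, n * stage // num_stages)
        pool = ordered[:cutoff]
        for i in range(0, len(pool), batch_size):
            yield pool[i:i + batch_size]
```

Early stages then see only short, easy sequences, which in the paper's setting speeds up the convergence of stochastic gradient descent.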
Multiple Document Datasets Pre-training Improves Text Line Detection With Deep Neural Networks
In this paper, we introduce a fully convolutional network for the document
layout analysis task. While state-of-the-art methods use models pre-trained
on natural scene images, our method, Doc-UFCN, relies on a U-shaped model
trained from scratch for detecting objects in historical documents. We
consider the line segmentation task, and more generally the layout analysis
problem, as a pixel-wise classification task: our model outputs a
pixel labeling of the input images. We show that Doc-UFCN outperforms
state-of-the-art methods on various datasets and also demonstrate that parts
pre-trained on natural scene images are not required to reach good
results. In addition, we show that pre-training on multiple document datasets
can improve performance. We evaluate the models using various metrics to
allow a fair and complete comparison between the methods.
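The pixel-wise formulation above can be illustrated with a toy decoding step (a sketch in plain Python; the shapes and the two-class set are assumptions, not Doc-UFCN's actual configuration):

```python
# A pixel-wise classifier scores every class at every pixel; the layout
# prediction is the per-pixel argmax over those scores. The two classes
# below (0 = background, 1 = text line) are illustrative only.

def logits_to_label_map(logits):
    """logits: nested lists of shape (num_classes, H, W) -> (H, W) label map."""
    num_classes = len(logits)
    height, width = len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][y][x])
         for x in range(width)]
        for y in range(height)
    ]

# Toy scores for a 2x3 image.
logits = [
    [[0.9, 0.2, 0.8],
     [0.1, 0.7, 0.6]],   # class 0: background scores
    [[0.1, 0.8, 0.2],
     [0.9, 0.3, 0.4]],   # class 1: text-line scores
]
label_map = logits_to_label_map(logits)
```

Each output pixel carries a layout class, from which line polygons can then be extracted by connected-component analysis.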
Key-value information extraction from full handwritten pages
We propose a Transformer-based approach for information extraction from
digitized handwritten documents. Our approach combines, in a single model, the
different steps that were so far performed by separate models: feature
extraction, handwriting recognition, and named entity recognition. We compare
this integrated approach with traditional two-stage methods that perform
handwriting recognition before named entity recognition, and present results at
different levels: line, paragraph, and page. Our experiments show that
attention-based models are especially interesting when applied to full pages,
as they do not require any prior segmentation step. Finally, we show that they
are able to learn from key-value annotations: a list of important words with
their corresponding named entities. We compare our models to state-of-the-art
methods on three public databases (IAM, ESPOSALLES, and POPP) and outperform
previous results on all three datasets.
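The key-value annotations mentioned above pair important words with entity labels; one common way to feed such pairs to an attention-based sequence model is to serialize them into a single tagged target string (an assumption for illustration, not necessarily the paper's exact format):

```python
# Serialize key-value annotations (word, entity) into one target sequence
# with entity tags. The tag syntax and example values are hypothetical.

def serialize_key_values(annotations):
    """annotations: list of (word, entity) pairs -> single target string."""
    return " ".join(f"<{entity}> {word}" for word, entity in annotations)

target = serialize_key_values([("Maria", "name"), ("1883", "birth_year")])
```

A model trained on such targets never needs word-level localization, only the page image and the tagged string.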
AI in the Service of Content Indexing in Libraries
Slides from Christopher Kermorvant's talk at the Biennale du numérique 2023, "Intelligence artificielle : écosystèmes, enjeux, usages".
SIMARA: a database for key-value information extraction from full pages
We propose a new database for information extraction from historical
handwritten documents. The corpus includes 5,393 finding aids from six
different series, dating from the 18th to the 20th centuries. Finding aids are
handwritten documents that contain metadata describing older archives. They are
stored in the National Archives of France and are used by archivists to
identify and find archival documents. Each document is annotated at page level
and contains seven fields to retrieve. The localization of each field is not
available, so this dataset encourages research on segmentation-free systems
for information extraction. We propose a model based on the Transformer
architecture trained for end-to-end information extraction, and provide three
sets for training, validation, and testing to ensure fair comparison with
future work. The database is freely accessible at
https://zenodo.org/record/7868059.
Large-scale genealogical information extraction from handwritten Quebec parish records
This paper presents a complete workflow designed for extracting information from Quebec handwritten parish registers. The acts in these documents contain individual and family information highly valuable for genetic, demographic, and social studies of the Quebec population. From an image of parish records, our workflow is able to identify the acts and extract personal information. The workflow is divided into successive steps: page classification, text line detection, handwritten text recognition, named entity recognition, and act detection and classification. For all these steps, different machine learning models are compared. Once the information is extracted, validation rules designed by experts are applied to standardize the extracted information and ensure its consistency with the type of act (birth, marriage, or death). This validation step is able to reject records that are considered invalid or merged. The full workflow has been used to process over two million pages of Quebec parish registers from the 19th and 20th centuries. On a sample comprising 65% of the registers, 3.2 million acts were recognized. Verification of the birth and death acts from this sample shows that 74% of them are considered complete and valid. These records will be integrated into the BALSAC database and linked together to recreate family and genealogical relations at large scale.
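The successive workflow steps listed above can be sketched as a simple pipeline of stage functions (the stubs below are placeholders; the actual trained models and expert rule sets are not described as a public API):

```python
# Each stage consumes the output of the previous one. All stage bodies are
# illustrative stubs standing in for the trained models of the workflow.

def classify_page(image):          # step 1: page classification (stub)
    return {"image": image, "page_type": "act_page"}

def detect_lines(page):            # step 2: text line detection (stub)
    page["lines"] = ["line-1", "line-2"]
    return page

def recognize_text(page):          # step 3: handwritten text recognition (stub)
    page["text"] = " ".join(page["lines"])
    return page

def extract_entities(page):        # step 4: named entity recognition (stub)
    page["entities"] = {"name": "?", "date": "?"}
    return page

def validate(page):                # step 5: expert validation rules (stub)
    page["valid"] = bool(page["entities"])
    return page

def run_workflow(image):
    """Run all stages in order on one page image."""
    page = classify_page(image)
    for stage in (detect_lines, recognize_text, extract_entities, validate):
        page = stage(page)
    return page
```

Structuring the workflow this way makes each stage independently replaceable, which matches the paper's comparison of different models per step.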
Landscape Analysis for the Specimen Data Refinery
This report reviews the current state-of-the-art applied approaches to automated tools, services, and workflows for extracting information from images of natural history specimens and their labels. We consider the potential for repurposing existing tools, including workflow management systems, and areas where more development is required. This paper was written as part of the SYNTHESYS+ project for software development teams and informatics teams working on new software-based approaches to improve mass digitisation of natural history specimens.
A Comparison of Noise Reduction Techniques for Robust Speech Recognition
This report presents the integration of several noise reduction methods into the frontend for speech recognition developed at IDIAP. The chosen methods are: Spectral Subtraction, Cepstral Mean Subtraction, and Blind Equalization. These different methods are studied from a theoretical point of view. Their implementation is described and they are tested on the Numbers95 speech database. Good noise robustness is obtained by combining two of these methods, such as Spectral Subtraction with Cepstral Mean Subtraction or Spectral Subtraction with Blind Equalization. The latter combination is found to be more appropriate for real recognition systems since it is frame synchronous. A comparison with Jah-RASTA-PLP is also given. Acknowledgements: The support of the OFES under the grant for the "Speech, Hearing and Recognition" (SPHEAR) project # OFES 970299 is gratefully acknowledged. The work described in this report benefited from fruitful discussions with Chafic Mokbel. (IDIAP-RR 99-10)
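Cepstral Mean Subtraction, one of the combined methods above, amounts to removing the per-utterance mean from each cepstral coefficient (a pure-Python sketch; real frontends operate on MFCC frames computed from audio):

```python
# Cepstral Mean Subtraction: subtracting the long-term mean of each cepstral
# coefficient cancels stationary convolutional (channel) distortion, since a
# fixed channel adds a constant offset in the cepstral domain.

def cepstral_mean_subtraction(frames):
    """frames: list of cepstral vectors (lists of floats) for one utterance.

    Returns the frames with the per-coefficient mean removed.
    """
    n = len(frames)
    dim = len(frames[0])
    means = [sum(frame[i] for frame in frames) / n for i in range(dim)]
    return [[frame[i] - means[i] for i in range(dim)] for frame in frames]
```

Note that this whole-utterance form is not frame synchronous, which is why the report prefers the Blind Equalization combination for real recognition systems.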